The Future of Neurotechnology Depends on AI

Blake Richards, associate professor at McGill University and core academic member of Mila - Quebec Artificial Intelligence Institute, discusses how deep learning and tokenization could unlock the potential of complex neural data, paving the way for groundbreaking applications in brain-computer interfaces and clinical diagnostics.
We all want to know how our brains work, but the brain seems unwilling to be understood. Neural data is sparse, incredibly diverse, and incomplete, encompassing multiple modalities, species, and brain regions. Conventional statistical methods are overwhelmed by these data, and deep learning might be the solution. On the other hand, deep learning requires extensive and uniform datasets, and, as mentioned, neural data is neither. What strategies or innovations have been most effective in bridging this gap?
It's worth noting that deep learning doesn't necessarily require highly uniform data, thanks to foundation models — models pre-trained in a self-supervised manner on vast amounts of data. These models are often trained on diverse datasets.
Also, if you consider generative pre-trained language models, like ChatGPT and other LLMs – they are trained on large text corpora from the internet. This data is only as homogeneous as internet text itself, which is quite varied. However, despite the diversity in online text or images, the underlying data structures are consistent.
What I mean by that is that even though different websites cover different topics, for example, they all draw on the same lexicon of a given language. Likewise, when you train a model on images taken from the internet, all of the images sit in the same space: the space of possible RGB pixel values.
Where neuroscience data is uniquely challenging for deep learning is that you're taking data from different brains that do not share the same ensemble of neurons. Moreover, you're using recording devices that listen to different neurons within the brain and receive signals that are a different mixture of those neurons. If you're doing electrophysiology, you're getting a different form of signal from these neurons than you are if you're doing fMRI (Editor's note: functional magnetic resonance imaging).
The challenge we face with neural data is not that it is heterogeneous per se in terms of its content, because that's already something deep learning can handle with foundation models. The problem is that the way the data is expressed is heterogeneous. It would be like trying to train a model on data taken from the internet where every website has its own unique vocabulary of words. This is the challenge we face with neural data. Now, how have people started to address this?
It's still an unresolved problem. My group, in collaboration with Eva Dyer's group (Georgia Tech - Editor's note), has developed techniques for tokenizing neural data.

Blake Richards
Blake Richards is an associate professor at the School of Computer Science and the Department of Neurology and Neurosurgery at McGill University. He is also a core academic member of Mila – Quebec Artificial Intelligence Institute. Richards' research focuses on the intersection of neuroscience and artificial intelligence. His lab explores the universal principles of intelligence that apply to both natural and artificial agents.
He has received several prestigious awards, including the NSERC Arthur B. McDonald Fellowship in 2022, the Canadian Association for Neuroscience Young Investigator Award in 2019, and a Canada CIFAR AI Chair in 2018. From 2011 to 2013, Richards was a Banting Postdoctoral Fellow at SickKids Hospital.
Richards earned his PhD in neuroscience from the University of Oxford in 2010 and his BSc in cognitive science and AI from the University of Toronto in 2004.

How does that work?
Tokenization is about transforming data, like words or pixel values, into vectors that neural networks, especially transformer models, can process. Currently, deep learning uses fairly simple methods for this. For example, language models often group text by basic pattern matching, based on how frequently letters co-occur. This creates text chunks that might be individual words, word pairs, or sub-word components.
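The frequency-based merging Richards describes can be illustrated with a toy byte-pair-style procedure. This is a deliberately simplified sketch, not any production tokenizer: it repeatedly merges the most frequent adjacent pair of symbols in a single word, so that common letter sequences become single tokens.

```python
from collections import Counter

# Toy sketch of byte-pair-style tokenization: repeatedly merge the most
# frequent adjacent symbol pair, so frequent letter runs become one token.
def bpe_tokens(word, n_merges=2):
    symbols = list(word)
    for _ in range(n_merges):
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]   # most frequent adjacent pair
        merged, i = [], 0
        while i < len(symbols):
            # Merge every occurrence of the chosen pair into one symbol.
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols

print(bpe_tokens("banana"))
```

A real tokenizer learns its merge rules from a whole corpus rather than one word, but the principle — chunks emerging from co-occurrence statistics — is the same.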
In images, tokenization is even simpler—the image is cut into small patches of pixel values, and each patch is passed through a simple linear projection to create a vector, without any sophisticated mapping.
The challenge in neuroscience is figuring out how to tokenize complex neural data, like fMRI signals, neuron spikes, calcium traces, or EEG waveforms. Different types of neural data will likely need different tokenization techniques. Still, these methods must somehow produce vectors in a shared space to train models across multiple modalities, like electrophysiology and fMRI.
We've developed good procedures for tokenizing spiking and calcium data. We're working on new systems for continuous waveform signals like EEG. We have yet to develop good techniques for merging all of these different modalities.
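To make the idea of tokenizing spiking data concrete, here is one simple, illustrative scheme — not the procedure Richards' group actually uses — in which spike times are binned into fixed time windows, so each window's per-neuron count vector can serve as one token:

```python
import numpy as np

# Illustrative sketch only: turn a spike train into a token sequence by
# counting spikes per neuron in fixed time bins; each bin is one token.
def spikes_to_tokens(spike_times, neuron_ids, n_neurons, bin_ms=20.0, t_max_ms=100.0):
    n_bins = int(np.ceil(t_max_ms / bin_ms))
    counts = np.zeros((n_bins, n_neurons), dtype=int)
    for t, n in zip(spike_times, neuron_ids):
        b = min(int(t // bin_ms), n_bins - 1)   # clamp late spikes to last bin
        counts[b, n] += 1
    return counts                               # (time_bins, n_neurons)

tokens = spikes_to_tokens([5.0, 12.0, 55.0], [0, 1, 0], n_neurons=2)
print(tokens.shape)  # (5, 2)
```

Note the difficulty Richards describes shows up immediately: the token dimension depends on `n_neurons`, which differs between recordings and between brains, which is exactly why a shared token space across sessions and modalities is hard.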
What makes it so difficult?
The challenge lies in ensuring that tokens created for different modalities are meaningful enough for the network to recognize shared patterns between them. For instance, we have separate tokenization procedures for spiking data and calcium data. It's not immediately obvious that simply feeding both types of tokens into a neural network will allow it to learn how the calcium and spike tokens correspond to each other. More work is needed to ensure that tokens from different modalities share a common underlying meaning.
There will be almost no practical applications of neural data until we build these models.
How do these large-scale neural decoding models translate into practical applications, such as brain-computer interfaces or clinical diagnostics?
I think there are many practical applications for these models. In fact, I'd go so far as to say that there will be almost no practical applications of neural data until we build these models. Getting back to what you said at the start, the problem is that neural activity is very complex. It's highly nonlinear, highly stochastic, and high-dimensional. As such, traditional statistical techniques have failed over the last two decades to unlock any of these potential applications.
Many people have tried and failed. Yet, in principle, we know it should be possible. I think it comes down to the fact that we have not had sufficient power to identify these complex statistical patterns using our traditional models. That's why moving toward large-scale deep learning systems that excel at identifying complex patterns in data will be almost a prerequisite for most of the downstream neurotechnology applications we can envision for neural activity.
This includes diagnosing whether a person has a specific condition, predicting whether someone might develop a condition in the future, and determining whether a particular treatment plan will work for an individual. For example, with depression, not everyone responds well to SSRIs or antidepressants. Being able to predict in advance, based on neural activity, whether a person will respond to these drugs would be a game-changer.
Better closed-loop techniques for controlling neural activity require both recording the neural activity and predicting how it should be controlled. For example, with deep brain stimulation used to treat diseases like Parkinson's and depression, current methods are quite imprecise. Doctors place electrodes in regions of the brain that are thought to be relevant to the disorder, then try different electrical stimulation protocols—sometimes they work, sometimes they don’t. Imagine if, instead, we could record the person's neural activity and say, 'This is the region of the brain that needs stimulation, with this specific pattern.' Furthermore, we could monitor how the stimulation affects the activity and adapt as the patient improves.
Additionally, these models could power advanced brain-computer interfaces, enabling direct mental control of computers or robotic devices. This could be transformative for clinical applications, such as helping paralyzed individuals control prosthetic limbs or communicate using screen cursors. All of these applications hinge on identifying the relevant neural patterns that correspond to specific actions, conditions, or treatment responses.
What advancements do you foresee in the field of neural decoding over the next few years?
I think one of the key technological challenges right now is figuring out the best ways to tokenize data and input it into neural networks. Another challenge is developing effective self-supervised training methods. Traditional deep learning approaches typically rely on labeled data—like providing images with labels such as "dog" or "plane"—so the system learns to recognize patterns corresponding to specific categories.
In contrast, modern approaches using foundation or pre-trained generative models rely on self-supervised learning, where no labeled data is provided. Instead, the network learns patterns directly from the data itself. This is often done by predicting or filling in missing parts of the data. For example, in language models, the network might predict the next word in a sentence. In image models, masking is commonly used—parts of the image are blanked out, and the network must reconstruct the missing parts based on the visible areas.
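The masking objective described above can be sketched generically. This is a schematic example, not a training loop: a random fraction of the input is hidden, and the values at the hidden positions become the targets a model would be trained to reconstruct.

```python
import numpy as np

# Sketch of the masking idea behind self-supervised pre-training:
# hide a random fraction of the input and keep the hidden values as targets.
def mask_and_target(x, mask_frac=0.25, rng=None):
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) < mask_frac   # True where the model must predict
    masked = np.where(mask, 0.0, x)          # hidden entries replaced by zeros
    return masked, mask, x[mask]             # model input, mask, reconstruction targets

x = np.arange(12, dtype=float).reshape(3, 4)
masked, mask, targets = mask_and_target(x)
# A model would be trained so its predictions at `mask` positions match `targets`.
```

For neural data, `x` might be a matrix of binned activity, with the model asked to fill in masked time windows or masked neurons from the surrounding context.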
So we have effective self-supervised techniques for language and images, but not yet for neural data. Some progress has been made, and we're actively working on it.
I think the techniques that exist are okay, but they're not quite at the level we want.
This interview was conducted by Laila Oudray